A Novel Approach to Mining Maximal Frequent Itemsets Based on Genetic Algorithm
Authors
Abstract
We present a new approach, based on a genetic algorithm, for generating maximal frequent itemsets from large databases. The algorithm, called GeneticMax, is a heuristic that mimics natural selection to find maximal frequent itemsets efficiently. Its search strategy uses a lexicographic tree and avoids level-by-level searching, which reduces the time required to mine maximal frequent itemsets in a linear way. Our implementation represents the nodes of the lexicographic tree as bitmaps and identifies frequent itemsets from the superset-subset relationships among the nodes. Because the algorithm follows the principles of genetic algorithms, it performs a global search, and its time complexity is lower than that of other algorithms because a genetic algorithm is based on a greedy approach. We isolate the effect of each step of the algorithm through experimental analysis on real databases, including Tic Tac Toe, Zoo, and a 10000×8 database, among others. Our experimental results show that the approach is efficient and scalable for different sizes of itemsets. It accesses the main database to calculate support values for fewer nodes when finding frequent itemsets, even when the search space is very large, which dramatically reduces the search time.
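The abstract names three ingredients: bitmap representation, superset-subset checks, and a genetic search. The Python sketch below illustrates how these could fit together; it is not the authors' GeneticMax implementation. All function names, the fitness function (itemset length if frequent, otherwise 0), tournament selection, uniform crossover, and single-item mutation are assumptions made for illustration, and the lexicographic tree is omitted. Because the search is heuristic, the result is maximal only among the frequent itemsets the search happened to visit.

```python
import random


def build_bitmaps(transactions, items):
    """Vertical bitmaps: bit t of bitmaps[item] is set when transaction t contains item."""
    bitmaps = {item: 0 for item in items}
    for t, transaction in enumerate(transactions):
        for item in transaction:
            bitmaps[item] |= 1 << t
    return bitmaps


def support(itemset, bitmaps, n_transactions):
    """Count transactions containing every item: AND the bitmaps, then count set bits."""
    mask = (1 << n_transactions) - 1
    for item in itemset:
        mask &= bitmaps[item]
    return bin(mask).count("1")


def keep_maximal(itemsets):
    """Drop every itemset that is a proper subset of another itemset in the collection."""
    sets = set(map(frozenset, itemsets))
    return {s for s in sets if not any(s < other for other in sets)}


def genetic_search(transactions, min_support, generations=50, pop_size=20, seed=0):
    """Hypothetical GA over itemsets; pop_size is assumed to be even."""
    rng = random.Random(seed)
    items = sorted({i for t in transactions for i in t})
    n = len(transactions)
    bitmaps = build_bitmaps(transactions, items)

    def fitness(chrom):
        # Assumed fitness: longer frequent itemsets score higher, infrequent ones score 0.
        return len(chrom) if chrom and support(chrom, bitmaps, n) >= min_support else 0

    population = [frozenset(i for i in items if rng.random() < 0.5)
                  for _ in range(pop_size)]
    frequent = set()
    for _ in range(generations):
        frequent.update(c for c in population if fitness(c) > 0)
        # Tournament selection: the fitter of two random individuals becomes a parent.
        parents = [max(rng.sample(population, 2), key=fitness) for _ in range(pop_size)]
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            for _ in range(2):
                # Uniform crossover: each item's membership is copied from a random parent.
                child = {i for i in items if i in (a if rng.random() < 0.5 else b)}
                # Mutation: toggle the membership of one random item.
                child ^= {rng.choice(items)}
                children.append(frozenset(child))
        population = children
    frequent.update(c for c in population if fitness(c) > 0)
    return keep_maximal(frequent)


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset in genetic_search(data, min_support=3):
        print(sorted(itemset))
```

Storing one integer bitmap per item means a support count needs only bitwise ANDs and a bit count, with no rescan of the transactions, which is one plausible reading of how a bitmap representation reduces database accesses.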
Similar resources
Maximal frequent itemset generation using segmentation approach
Finding frequent itemsets in a data source is a fundamental operation behind Association Rule Mining. Generally, many algorithms use either a bottom-up or a top-down approach to find these frequent itemsets. When the frequent itemsets to be found are long, the traditional algorithms find all the frequent itemsets from 1-length to n-length, which is a difficult process. This prob...
Data sanitization in association rule mining based on impact factor
Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses. It alleviates the concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...
A Matrix based Maximal Frequent Itemset Mining Algorithm without Subset Creation
Frequent pattern mining is the main step in association rule mining. Several algorithms have been proposed for it, but most of them have two main problems: a large number of database scans and the generation of large candidate itemsets. This process is time-intensive because these algorithms first mine the minimal frequent itemsets and then generate maximal frequent itemsets from m...
Maximal Frequent Itemsets Mining Using Database Encoding
Frequent itemset mining is a classic problem in data mining and has played an important role in data mining research for over a decade. However, mining all frequent itemsets leads to a massive number of itemsets. Fortunately, this problem can be reduced to the mining of maximal frequent itemsets. In this paper, we propose a new method for mining maximal frequent itemsets. Our method ...
An Improved Mining Algorithm of Maximal Frequent Itemsets
Mining maximal frequent itemsets is very important in many data mining applications. How to improve the efficiency and effectiveness of mining algorithms has become an active research question. In this paper, we introduce a new method to solve this problem, which is based on graph theory. Firstly, the concept of directed itemsets graph and the trifurcate linked list storage structure are p...